## LAB 2 EXERCISE

a.) Show the timing of this instruction sequence for the RISC pipeline without any forwarding or bypassing hardware but assuming a register read and a write in the same clock cycle "forwards" through the register file. Assume that the branch is handled by flushing the pipeline. If all memory references take 1 cycle, how many cycles does this loop take to execute?

#### **Assumptions:**

- No forwarding/bypassing.
- Register read/write in same cycle forwards through register file.
- Branch resolved in ID, flush IF and ID on taken branch (2-cycle penalty).
- Stalls due to data hazards.
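
The no-forwarding rule above can be sketched in a few lines of Python. This is a minimal model, not the lab's reference solution: it tracks only the ID and WB cycle of each instruction, ignores fetch and branch flushing, and assumes ID is followed by EX, ME, WB with no further stalls. The names `Instr`, `id_and_wb_cycles`, and `LOOP_BODY` are hypothetical, and the register operands are assumed from the loop listed in part c (without its delay-slot instruction).

```python
from typing import List, NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    name: str
    dest: Optional[str]      # register written in WB, or None
    srcs: Tuple[str, ...]    # registers read in ID


def id_and_wb_cycles(program: List[Instr]) -> List[Tuple[str, int, int]]:
    """Return (name, ID cycle, WB cycle) for an in-order pipeline without forwarding.

    A consumer may read a register in the same cycle its producer writes it
    back, because the register file is written in the first half of the cycle
    and read in the second half.
    """
    wb_of = {}      # register -> WB cycle of its most recent producer
    timeline = []
    prev_id = 1     # first instruction: IF in cycle 1, ID in cycle 2
    for ins in program:
        earliest = prev_id + 1  # in-order: one ID per cycle at best
        ready = max((wb_of.get(r, 0) for r in ins.srcs), default=0)
        id_cycle = max(earliest, ready)
        wb_cycle = id_cycle + 3  # ID -> EX -> ME -> WB
        if ins.dest is not None:
            wb_of[ins.dest] = wb_cycle
        timeline.append((ins.name, id_cycle, wb_cycle))
        prev_id = id_cycle
    return timeline


# Assumed loop body (operands taken from the listing in part c).
LOOP_BODY = [
    Instr("LD", "R1", ("R2",)),
    Instr("DADDI", "R1", ("R1",)),
    Instr("SD", None, ("R1", "R2")),
    Instr("DADDI(R2)", "R2", ("R2",)),
    Instr("DSUB", "R4", ("R3", "R2")),
    Instr("BNEZ", None, ("R4",)),
]

for name, id_cycle, wb_cycle in id_and_wb_cycles(LOOP_BODY):
    print(f"{name:10s} ID={id_cycle:2d} WB={wb_cycle:2d}")
```

Because the model omits IF and the branch flush, it is only a cross-check on where each data hazard releases, not a replacement for the full pipeline table.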

Pipeline Table (first iteration, subsequent similar):

| Inst      | 1  | 2  | 3  | 4  | 5  | 6  | 7  | 8  | 9  | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 |
|-----------|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|
| LD        | IF | ID | EX | ME | WB |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |
| DADDI     |    | IF | ID |    |    | ID | EX | ME | WB |    |    |    |    |    |    |    |    |    |    |    |
| SD        |    |    |    | IF |    |    | ID | EX | ME | WB |    |    |    |    |    |    |    |    |    |    |
| DADDI(R2) |    |    |    |    | IF |    |    | ID | EX | ME | WB |    |    |    |    |    |    |    |    |    |
| DSUB      |    |    |    |    |    |    |    |    | ID | EX | ME | WB |    |    |    |    |    |    |    |    |

| Inst     |  |  |  |    |  |    |    |    |    |    |    |    |    |  |  |
|----------|--|--|--|----|--|----|----|----|----|----|----|----|----|--|--|
| BNEZ     |  |  |  | IF |  | ID | EX | ME |    |    |    |    |    |  |  |
| LD(next) |  |  |  |    |  |    |    |    | IF | ID | EX | ME | WB |  |  |

- Stalls:

- `DADDI`: 2 stalls (cycles 4–5) waiting for R1 from `LD` (WB cycle 5).
- `SD`: 1 stall (cycle 6) waiting for R1 from `DADDI` (WB cycle 8).
  - `DADDI(R2)`: 1 stall (cycle 9) due to prior stalls.
  - `DSUB`: 2 stalls (cycles 11–12) due to prior stalls.
- `BNEZ`: 1 stall (cycle 14) waiting for R4 from `DSUB` (WB cycle 15).
- `LD` (next): 3 stalls (cycles 14–16) waiting for R2 from `DADDI(R2)` (WB cycle 12).
- Branch taken: 2-cycle flush (cycles 16–17).
- Cycles per iteration: 17 cycles (6 instructions + 9 stalls + 2 flush).
- Total cycles: \( 99 \times 17 + 15 \) (final iteration, no flush) = 1698 cycles.
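
The cycle accounting in the bullets above can be checked with a short script. This is a sketch that takes the counts stated in the text (6 instructions, 9 stalls, 2 flush cycles, 99 iterations with a taken branch plus one final iteration) as given. The function names are hypothetical.

```python
def cycles_per_iteration(instructions: int, stalls: int, flush: int) -> int:
    """Cycles for one loop iteration whose branch is taken."""
    return instructions + stalls + flush


def total_cycles(taken_iterations: int, per_iteration: int, flush: int) -> int:
    """Total cycles: taken iterations plus a final iteration without the flush."""
    final_iteration = per_iteration - flush
    return taken_iterations * per_iteration + final_iteration


per_iter = cycles_per_iteration(instructions=6, stalls=9, flush=2)
total = total_cycles(taken_iterations=99, per_iteration=per_iter, flush=2)
print(per_iter, total)  # prints "17 1698"
```

The same two functions apply to any variant of the loop once its stall and flush counts are known.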

b.) Forwarding, Predict Branch Not Taken

#### **Assumptions:**

- Normal forwarding/bypassing.
- Branch predicted not taken, flush IF/ID if taken (2-cycle penalty).

Pipeline Table:

| Inst      | 1  | 2  | 3  | 4  | 5  | 6  | 7  | 8  | 9  | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 |
|-----------|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|
| LD        | IF | ID | EX | ME | WB |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |
| DADDI     |    | IF | ID | EX | ME | WB |    |    |    |    |    |    |    |    |    |    |    |    |    |    |
| SD        |    |    | IF | ID | EX | ME | WB |    |    |    |    |    |    |    |    |    |    |    |    |    |
| DADDI(R2) |    |    |    | IF | ID | EX | ME | WB |    |    |    |    |    |    |    |    |    |    |    |    |
| DSUB      |    |    |    |    | IF | ID | EX | ME | WB |    |    |    |    |    |    |    |    |    |    |    |
| BNEZ      |    |    |    |    |    | IF | ID | EX | ME | WB |    |    |    |    |    |    |    |    |    |    |
| LD (next) |    |    |    |    |    |    |    | IF |    |    | ID | EX | ME | WB |    |    |    |    |    |    |

- Stalls:

- `LD` (next) stalls 2 cycles (cycles 9–10) waiting for R2 from `DADDI(R2)` (EX cycle 8).
- Branch taken: 2-cycle flush.
- Cycles per iteration: 10 cycles (6 instructions + 2 stalls + 2 flush).
- Total cycles: \( 99 \times 10 + 8 \) (final iteration, no flush) = 998 cycles.
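
Written out step by step, using only the counts quoted in the bullets above (6 instructions, 2 stalls, 2 flush cycles, and a final iteration that skips the flush), the arithmetic is:

\[
\begin{aligned}
\text{cycles per iteration} &= 6 + 2 + 2 = 10 \\
\text{final iteration} &= 10 - 2 = 8 \\
\text{total cycles} &= 99 \times 10 + 8 = 990 + 8 = 998
\end{aligned}
\]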

c.) Assume the RISC pipeline with a single-cycle delayed branch and normal forwarding and bypassing hardware. Schedule the instructions in the loop, including the branch delay slot. You may reorder instructions and modify the individual instructions' operands, but do not undertake other loop transformations that change the number or opcode of the instructions in the loop. Show a pipeline timing diagram and compute the number of cycles needed to execute the entire loop.

#### **Assumptions:**

- Single-cycle delayed branch (execute delay slot).
- Normal forwarding.
- Reorder instructions, modify operands, same opcodes/number.

Reordered Loop:

    loop:
        LD    R1, 0(R2)
        DADDI R1, R1, 1
        SD    4(R2), R1
        DADDI R2, R2, 4
        DSUB  R4, R3, R2
        BNEZ  R4, loop
        LD    R1, 4(R2)    ; delay slot

Pipeline Table:

| Inst       | 1  | 2  | 3  | 4  | 5  | 6  | 7  | 8  | 9  | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 |
|------------|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|
| LD         | IF | ID | EX | ME | WB |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |
| DADDI      |    | IF | ID | EX | ME | WB |    |    |    |    |    |    |    |    |    |    |    |    |    |    |
| SD         |    |    | IF | ID | EX | ME | WB |    |    |    |    |    |    |    |    |    |    |    |    |    |
| DADDI(R2)  |    |    |    | IF | ID | EX | ME | WB |    |    |    |    |    |    |    |    |    |    |    |    |
| DSUB       |    |    |    |    | IF | ID | EX | ME | WB |    |    |    |    |    |    |    |    |    |    |    |
| BNEZ       |    |    |    |    |    | IF | ID | EX | ME | WB |    |    |    |    |    |    |    |    |    |    |
| LD (delay) |    |    |    |    |    |    | IF | ID | EX | ME | WB |    |    |    |    |    |    |    |    |    |

- Stalls: None, as forwarding and scheduling resolve hazards.
- Cycles per iteration: 7 cycles (6 instructions + 1 delay slot).

- Total cycles: \( 100 \times 7 = 700 \) cycles.